TRIQ: A Comprehensive Evaluation Measure for Triclustering Algorithms
Triclustering has been shown to be a valuable tool for the analysis
of microarray data since its appearance as an improvement on classical
clustering and biclustering techniques. Triclustering relaxes the
constraints for grouping and allows genes to be evaluated under a subset
of experimental conditions and a subset of time points simultaneously.
The authors previously presented a genetic algorithm, TriGen,
that finds triclusters of gene expression data. They also defined three
different fitness functions for TriGen: MSR3D, LSL and MSL. In order
to assess the results obtained by applying TriGen, a validity measure
needs to be defined. Therefore, we present TRIQ, a validity measure
that combines information from three different sources: (1) correlation
among genes, conditions and times, (2) graphic validation of the patterns
extracted and (3) functional annotations for the genes extracted.
Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02-02; Ministerio de Ciencia y Tecnología TIN2014-55894-C2-1-R; Junta de Andalucía P12-TIC-752
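The abstract names MSR3D as one of TriGen's fitness functions but does not define it. As a hedged sketch, assuming MSR3D is the three-dimensional analogue of Cheng and Church's mean squared residue for biclusters, the residue of each cell compares it with its gene, condition and time means. The function name `msr3d` and the exact residue formula below are assumptions, not the authors' published definition.

```python
import numpy as np

def msr3d(t: np.ndarray) -> float:
    """Mean squared residue of a tricluster t with shape (genes, conditions, times).

    Assumed residue: r_ijk = a_ijk - a_iJK - a_IjK - a_IJk + 2 * a_IJK,
    a 3D generalization of the bicluster MSR (not necessarily TriGen's exact form).
    """
    a_iJK = t.mean(axis=(1, 2), keepdims=True)  # mean of each gene slice
    a_IjK = t.mean(axis=(0, 2), keepdims=True)  # mean of each condition slice
    a_IJk = t.mean(axis=(0, 1), keepdims=True)  # mean of each time slice
    a_IJK = t.mean()                            # overall mean
    residue = t - a_iJK - a_IjK - a_IJk + 2 * a_IJK
    return float((residue ** 2).mean())

# A perfectly additive tricluster (gene + condition + time effects) has zero residue.
genes = np.array([0.0, 1.0, 3.0])[:, None, None]
conds = np.array([0.0, 2.0])[None, :, None]
times = np.array([0.0, 0.5, 1.0, 4.0])[None, None, :]
additive = genes + conds + times
print(msr3d(additive))
```

Lower values mean a more coherent tricluster; because the additive effects cancel exactly in the residue, shifting a whole gene, condition or time slice by a constant does not change the score.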
LSL: A new measure to evaluate triclusters
Microarray technology has led to great advances
in biological studies due to its ability to monitor the RNA levels
of a vast number of genes under certain experimental conditions.
The use of computational techniques to mine hidden knowledge
from these data is of great interest in research fields such as
Data Mining and Bioinformatics. Finding patterns of genetic
behavior that take into account not only the experimental conditions
but also time is a very challenging task.
Clustering, biclustering and the novel triclustering techniques offer
a very suitable framework to solve this problem. In
this work we present LSL, a measure to evaluate the quality of
triclusters found in 3D data.
MSL: A Measure to Evaluate Three-dimensional Patterns in Gene Expression Data
Microarray technology is widely used in biological research environments due to its ability to monitor RNA concentration levels. The
analysis of the data generated represents a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to
create groups of genes that exhibit similar behavior. Biclustering relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of
the conditions. Triclustering arises in the analysis of longitudinal experiments in which the genes are evaluated under certain conditions at several time
points. These triclusters provide hidden information in the form of behavior patterns from temporal experiments with microarrays, relating subsets of genes,
experimental conditions, and time points. We present an evaluation measure for triclusters called Multi Slope Measure (MSL), based on the similarity among the
angles of the slopes of the profiles formed by the genes, conditions, and times of the tricluster.
Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02-02; Junta de Andalucía TIC-752
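The abstract describes MSL only as comparing the angles of the slopes of a tricluster's profiles. The sketch below illustrates that idea with a simple scoring rule and is not the published MSL formula: `slope_angles`, `angle_dispersion` and `msl_like` are hypothetical names, points are assumed to be one unit apart (so each segment's angle is the arctangent of the difference between consecutive values), and every dimension is assumed to have at least two elements.

```python
import numpy as np

def slope_angles(profiles: np.ndarray) -> np.ndarray:
    """Angles (radians) of the segments joining consecutive points of each row."""
    return np.arctan(np.diff(profiles, axis=1))

def angle_dispersion(profiles: np.ndarray) -> float:
    """Mean, over segments, of the std of slope angles across profiles (rows)."""
    return float(slope_angles(profiles).std(axis=0).mean())

def msl_like(t: np.ndarray) -> float:
    """Hypothetical slope-angle score for a (genes, conditions, times) tricluster.

    Profiles along each of the three axes are compared; 0 means every profile
    has the same shape along that axis.
    """
    g, c, k = t.shape
    along_times = t.reshape(g * c, k)                     # one row per (gene, condition)
    along_conds = t.transpose(0, 2, 1).reshape(g * k, c)  # one row per (gene, time)
    along_genes = t.transpose(1, 2, 0).reshape(c * k, g)  # one row per (condition, time)
    return (angle_dispersion(along_times)
            + angle_dispersion(along_conds)
            + angle_dispersion(along_genes)) / 3

genes = np.array([0.0, 1.0, 3.0])[:, None, None]
conds = np.array([0.0, 2.0])[None, :, None]
times = np.array([0.0, 0.5, 1.0, 4.0])[None, None, :]
additive = genes + conds + times                          # parallel (shifted) profiles
scaled = additive * np.array([1.0, 2.0, 3.0])[:, None, None]  # genes with different slopes
print(msl_like(additive), msl_like(scaled))
```

Working with angles rather than raw slopes bounds every segment to (-π/2, π/2), so a single very steep profile cannot dominate the score.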
Rationale for Timing of Follow-Up Visits to Assess Gluten-Free Diet in Celiac Disease Patients Based on Data Mining
The assessment of compliance with the gluten-free diet (GFD) is a keystone in the supervision of
celiac disease (CD) patients. Few data are available documenting evidence-based follow-up frequency
for CD patients. In this work we aim to create a criterion for timing the clinical follow-up of CD
patients using data mining. We have applied data mining to a dataset of 188 CD patients on a
GFD (75% of them children under 14 years old), using the presence of gluten immunogenic
peptides (GIP) in stools as a marker of adherence to the diet. The variables considered are gender, age,
years following the GFD and adherence to the GFD measured by fecal GIP. The results identify patients on a GFD for
more than two years (41.5% of the patients) as more prone to poor compliance, and so needing more
frequent follow-up than patients with less than two years on a GFD. This runs against the usual clinical
practice of following up patients on a long-term GFD less closely, as they are assumed to perform better. Our
results support adapting follow-up frequency to the number of years
on a GFD, age and gender. Patients on a long-term GFD should be monitored more frequently, as
they show a higher level of gluten exposure. A gender perspective should also be considered, as
non-compliance is partially linked to gender in our results: males tend to have more gluten exposure,
at least in the cultural context where our study was carried out. Children tend to perform better than
teenagers or adults.
Funding: Ministerio de Economía y Competitividad TIN2017-88209-C2-2-R; Junta de Andalucía US-126334
Modeling Genetic Networks: Comparison of Static and Dynamic Models
Biomedical research has been revolutionized by high-throughput
techniques and the enormous amount of biological data they are able to
generate. Interest in network models and systems biology is
rising rapidly. Genetic networks have become an essential tool for mining these data,
since they explain the function of genes in terms of how they influence other
genes. Many modeling approaches have been proposed for building genetic
networks. However, it is not clear what the advantages and disadvantages of
each model are. There are several ways to discriminate between network-building
models, one of the most important being whether the data being mined are
static or dynamic. In this work we compare static and dynamic models
on a problem related to inflammation and the host response to injury. We
show how both models provide complementary information and cross-validate
the results obtained.
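To illustrate why the static/dynamic distinction matters, here is a toy example that is not one of the models compared in the paper: a static view links genes by same-time correlation, while a dynamic view links a regulator at time t to a target at time t + 1. The functions `static_edge` and `dynamic_edge` and the simulated series are hypothetical.

```python
import numpy as np

def static_edge(x: np.ndarray, y: np.ndarray) -> float:
    """Same-time Pearson correlation: the kind of link a static model sees."""
    return float(np.corrcoef(x, y)[0, 1])

def dynamic_edge(x: np.ndarray, y: np.ndarray, lag: int = 1) -> float:
    """Pearson correlation between x at time t and y at time t + lag."""
    return float(np.corrcoef(x[:-lag], y[lag:])[0, 1])

rng = np.random.default_rng(42)
a = rng.normal(size=200)             # simulated regulator expression over 200 time points
b = np.concatenate(([0.0], a[:-1]))  # simulated target: copies a with a one-step delay
print(static_edge(a, b), dynamic_edge(a, b))
```

The delayed dependency is nearly invisible to the same-time correlation but obvious at lag 1. Conversely, a static model can exploit samples that have no time ordering at all, which is one way the two views can complement each other.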
Deep Learning Techniques to Improve the Performance of Olive Oil Classification
Olive oil assessment involves the use of a standardized sensory analysis according
to the “panel test” method. However, there is considerable interest in designing novel
strategies based on the use of Gas Chromatography (GC) coupled to mass spectrometry
(MS) or ion mobility spectrometry (IMS), together with chemometric data treatment,
for olive oil classification. This is an essential task for obtaining the most robust
model over time, both to avoid price fraud and to determine whether an oil is suitable
for consumption. The aim of this paper is to combine chemical techniques and
Deep Learning approaches to automatically classify olive oil samples from two different
harvests into their three corresponding classes: extra virgin olive oil (EVOO), virgin olive oil
(VOO), and lampante olive oil (LOO). Our Deep Learning model is built with 701 samples,
which were obtained from two olive oil campaigns (2014–2015 and 2015–2016). The
data from the two harvests are built from the selection of specific olive oil markers from
the whole spectral fingerprint obtained with the GC-IMS method. In order to obtain the
best results, we have configured the parameters of our model according to the nature
of the data. The results show that a deep learning approach applied to data
obtained from chemical instrumental techniques is a good method for classifying oil
samples into their corresponding categories, with higher success rates than those obtained
in previous works.
Funding: Ministerio de Economía y Competitividad TIN2017-88209-C2-2-
Revisiting the Yeast Cell Cycle Problem with the Improved TriGen Algorithm
Analyzing microarray data represents a computational
challenge due to the characteristics of these data.
Clustering techniques are widely applied to create groups of
genes that exhibit a similar behavior under the conditions
tested. Biclustering emerges as an improvement on classical
clustering since it relaxes the constraints for grouping, allowing
genes to be evaluated only under a subset of the conditions
and not under all of them. However, this technique is not
appropriate for the analysis of temporal microarray data in
which the genes are evaluated under certain conditions at
several time points. In previous work we presented the
TriGen algorithm, a genetic algorithm that finds triclusters
of gene expression that take into account the experimental
conditions and the time points simultaneously, and applied it
to the yeast (Saccharomyces cerevisiae) cell cycle problem.
In this article we present some improvements to the genetic
algorithm, as well as the results of applying the
improved TriGen algorithm to the yeast cell cycle problem,
where the goal is to identify all genes whose expression levels
are regulated by the cell cycle.
Triclustering on Temporary Microarray Data using the TriGen Algorithm
The analysis of microarray data is a computational
challenge due to the characteristics of these data.
Clustering techniques are widely applied to create groups of
genes that exhibit a similar behavior under the conditions
tested. Biclustering emerges as an improvement on classical
clustering since it relaxes the constraints for grouping, allowing
genes to be evaluated only under a subset of the conditions
and not under all of them. However, this technique is not
appropriate for the analysis of temporal microarray data in
which the genes are evaluated under certain conditions at
several time points. In this paper, we propose the TriGen
algorithm, which finds triclusters that take into account the
experimental conditions and the time points using evolutionary
computation, in particular genetic algorithms, enabling the
evaluation of the genes' behavior under subsets of conditions
and of time points.
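The abstract describes TriGen only at a high level. As a rough sketch of how a genetic algorithm can search for triclusters, the code below assumes fixed tricluster sizes, an MSR3D-style residue as the fitness (lower is better), truncation selection, dimension-wise crossover and single-index mutation. These operators and all function names (`random_individual`, `fitness`, `mutate`, `crossover`, `evolve`) are hypothetical and are not taken from the TriGen papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def msr3d(t: np.ndarray) -> float:
    # Assumed 3D mean squared residue (see the definition assumed for MSR3D).
    r = (t - t.mean(axis=(1, 2), keepdims=True) - t.mean(axis=(0, 2), keepdims=True)
         - t.mean(axis=(0, 1), keepdims=True) + 2 * t.mean())
    return float((r ** 2).mean())

def random_individual(shape, sizes):
    # An individual is a triple of sorted index arrays: genes, conditions, times.
    return tuple(np.sort(rng.choice(n, size=s, replace=False)) for n, s in zip(shape, sizes))

def fitness(data, ind):
    # Coherence of the selected sub-array; sizes are fixed, so no size term is needed.
    return msr3d(data[np.ix_(*ind)])

def mutate(ind, shape):
    # Swap one index in one randomly chosen dimension for an unused one.
    dim = rng.integers(3)
    idx = list(ind[dim])
    unused = np.setdiff1d(np.arange(shape[dim]), idx)
    if unused.size:
        idx[rng.integers(len(idx))] = rng.choice(unused)
    new = list(ind)
    new[dim] = np.sort(np.array(idx))
    return tuple(new)

def crossover(p1, p2):
    # Each dimension is inherited whole from one parent, which preserves sizes.
    return tuple(p1[d] if rng.random() < 0.5 else p2[d] for d in range(3))

def evolve(data, sizes=(5, 3, 3), pop_size=30, generations=60):
    pop = [random_individual(data.shape, sizes) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(data, ind))
        elite = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(elite) + len(children) < pop_size:
            i, j = rng.choice(len(elite), size=2, replace=False)
            children.append(mutate(crossover(elite[i], elite[j]), data.shape))
        pop = elite + children
    return min(pop, key=lambda ind: fitness(data, ind))

# Noise with a planted, perfectly additive (zero-residue) 5 x 3 x 3 block.
data = rng.normal(size=(30, 8, 6))
data[:5, :3, :3] = np.add.outer(np.add.outer(np.arange(5.0), np.arange(3.0)), np.arange(3.0))
best = evolve(data)
print(best, fitness(data, best))
```

Because the elite is carried over unchanged, the best fitness never gets worse between generations. A real triclustering GA would also let tricluster sizes vary, which requires a fitness that does not simply reward ever-smaller triclusters.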
Fusion of Domain Knowledge for Dynamic Learning in Transcriptional Networks
A critical challenge of the postgenomic era is to understand how
genes are differentially regulated even when they belong to a given network.
Because the fundamental mechanism controlling gene expression operates at
the level of transcription initiation, computational techniques have been developed that identify cis-regulatory features and map such features onto differential
expression patterns. The fact that such co-regulated genes may be differentially
regulated suggests that subtle differences in the shared cis-acting regulatory
elements are likely significant. Thus, we carry out an exhaustive description of
cis-acting regulatory features, including the orientation, location and number of
binding sites for a regulatory protein, the presence of binding site submotifs, the
class and number of RNA polymerase sites, as well as gene expression data,
which is treated as one feature among many. These features, derived from different domain sources, are analyzed concurrently, and dynamic relations are recognized to generate profiles, which are groups of promoters sharing common
features. We apply this method to probe the regulatory networks governed by
the PhoP/PhoQ two-component system in the enteric bacteria Escherichia coli
and Salmonella enterica. Our analysis uncovered novel members of the PhoP
regulon, and the resulting profiles group genes that share underlying biological features that characterize the system kinetics. The predictions were experimentally
validated to establish that the PhoP protein uses multiple mechanisms to control
gene transcription and is a central element in a highly connected network.
Funding: Ministerio de Ciencia y Tecnología BIO2004-0270-
Optimization of multi-classifiers for computational biology: application to gene finding and expression
Genomes of many organisms have been sequenced over the last few years. However, transforming such raw sequence data into knowledge remains a hard task. A great number of prediction programs have been developed to address part of this problem: the location of genes along a genome and their expression. We propose a multi-objective methodology to combine state-of-the-art algorithms into an aggregation scheme in order to obtain optimal aggregations of methods. The results show a major improvement in sensitivity when our methodology is compared to the performance of individual methods for gene finding and gene expression problems. The methodology proposed here is an automatic method generator and a step forward in exploiting all already existing methods, providing alternative optimal aggregations of methods to answer concrete queries for a given biological problem with maximized prediction accuracy. As more approaches are integrated for each of the presented problems, de novo accuracy can be expected to improve further.
Funding: Ministerio de Ciencia y Tecnología TIN2006-12879; Junta de Andalucía TIC-0278
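The abstract does not specify the aggregation operators or the objectives used. The sketch below illustrates the general idea of multi-objective method aggregation under stated assumptions: predictors are combined by k-of-n voting over every subset, sensitivity and specificity are the two objectives, and only non-dominated (Pareto-optimal) aggregations are kept. The predictors A, B and C and their outputs are made-up toy data, and all function names are hypothetical.

```python
from itertools import combinations

import numpy as np

def sens_spec(pred: np.ndarray, truth: np.ndarray):
    # Sensitivity and specificity of a boolean prediction against boolean truth.
    tp = np.sum(pred & truth); fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth); fp = np.sum(pred & ~truth)
    return tp / (tp + fn), tn / (tn + fp)

def aggregations(preds: dict):
    # Yield (name, prediction) for every subset of methods and every k-of-n vote.
    names = list(preds)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            votes = sum(preds[n].astype(int) for n in subset)
            for k in range(1, r + 1):
                yield f"{k} of {'+'.join(subset)}", votes >= k

def pareto_front(preds: dict, truth: np.ndarray):
    # Keep aggregations that no other aggregation beats on both objectives.
    scored = [(name, *sens_spec(p, truth)) for name, p in aggregations(preds)]
    return [(name, se, sp) for name, se, sp in scored
            if not any(se2 >= se and sp2 >= sp and (se2 > se or sp2 > sp)
                       for _, se2, sp2 in scored)]

# Toy data: 8 candidate sites, the first 4 are true genes.
truth = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
preds = {
    "A": np.array([1, 1, 0, 0, 0, 0, 0, 0], dtype=bool),
    "B": np.array([0, 1, 1, 0, 1, 0, 0, 0], dtype=bool),
    "C": np.array([0, 0, 1, 1, 0, 0, 0, 0], dtype=bool),
}
print(pareto_front(preds, truth))
```

Here A and C each find only half of the true genes, but they miss different ones, so their 1-of-2 union reaches full sensitivity without losing specificity. That is the kind of gain over individual methods that aggregation aims for. With real predictors the front usually holds several trade-offs rather than a single winner.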
- …